Skip to content

fix(python): deterministic pip/poetry scans + move smoke fixtures to bomly-dev#177

Merged
bomly-guy merged 5 commits into
mainfrom
fix/python-smoke-determinism
Jun 18, 2026
Merged

fix(python): deterministic pip/poetry scans + move smoke fixtures to bomly-dev#177
bomly-guy merged 5 commits into
mainfrom
fix/python-smoke-determinism

Conversation

@bomly-guy

@bomly-guy bomly-guy commented Jun 18, 2026

Copy link
Copy Markdown
Collaborator

Problem

The scheduled Smoke workflow fails daily on the Python jobs, and the suite still depended on third-party demo repos (Veracode, Snyk, ljharb, google/uuid). This PR fixes the Python determinism root causes, moves every external scan target onto bomly-dev's own example repos, and adds fixture-based detector tests.

Python drift (distinct from the EPSS fix in #176)

  • scan-python-pip — the pip detector resolves from pip inspect --local; --install-first installed into the ambient CI interpreter, so pip inspect captured the runner's own tooling plus latest-resolved transitive deps → daily drift.
  • scan-python-poetry--install-first bypassed the committed poetry.lock fast-path into poetry run pip inspect, dropping transitive edges (depends_on became []).
  • scan-python-pip-reachability — same ambient-capture cause in the dependency graph (e.g. idna 3.18 instead of pinned 2.8).

Changes

Detector

  1. Isolate pip --install-first in a venv (internal/detectors/python/venv.go): clean, project-scoped virtualenv for both pip install and pip inspect; ambient site-packages no longer leak in.
  2. pip requirements.lock fast-path (internal/detectors/python/piplock.go): build the graph from a committed, pinned, # via-annotated lock — no install, no inspect — mirroring poetry.lock.

Tests

  • New internal/detectors/python/lockfile_integration_test.go mirrors the node detector's fixture-based integration tests: drives each binary-free lock fast-path end-to-end through ResolveGraph against real manifests under testdata/lockfiles/{pip,poetry,uv,pipenv}, asserting package set, transitive edges, single application root, and runtime/development scope. Plus the existing unit tests for the lock parser and venv helpers.

All smoke/audit scan targets → bomly-dev (pinned; each carries a committed lock / go.sum):

  • python: scan-python-poetry drops --install-first; scan-python-pip + -reachabilityexample-python-pip (committed requirements.lock).
  • go: scan-go* / diff-go* / explain-go* / lite-*-goexample-go-gomod (was google/uuid).
  • java: scan-java-maven-reachabilityexample-java-maven.
  • npm: scan-npm-reachability / -scope-runtime / -auditexample-javascript-npm.

Cleanup: scrub residual third-party branding (com.srcclrcom.bomly in a maven detector test; a prose-template note). git grep for veracode|sourceclear|srcclr|snyk|ljharb|nodejs-goof is clean; no scan --url target points outside bomly-dev.

Docs: decision-log entry in docs/ARCHITECTURE.md.

Remaining

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features
    • Added a Python requirements.lock fast-path to build dependency graphs directly from pinned lockfiles.
    • Improved reproducible Python scanning by using a clean, project-scoped virtual environment for installs and venv-aware inspection.
  • Documentation
    • Updated architecture documentation to explain the Python determinism strategy, lockfile fast-path behavior, and venv isolation.
  • Tests
    • Added unit and integration tests for lockfile parsing/scoping and virtualenv selection/command behavior.
    • Refreshed smoke and detector test fixtures with newly pinned example inputs.

The pip detector resolves graphs from `pip inspect --local`, which reads
the active interpreter's site-packages. Running --install-first as a plain
`pip install` into the ambient interpreter captured unrelated runner tooling
(poetry, build, keyring, virtualenv, date-versioned helpers), so smoke output
drifted daily. Poetry's lock fast-path was also bypassed by --install-first,
dropping transitive edges (depends_on became empty).

- Isolate pip --install-first into a clean, project-scoped virtualenv
  (internal/detectors/python/venv.go) so pip install and pip inspect run
  against only the declared deps, not the ambient environment.
- Add a pip requirements.lock fast-path (internal/detectors/python/piplock.go)
  mirroring poetry.lock: build the graph directly from a committed,
  fully-pinned, `# via`-annotated lock with no install/inspect.
- Drop --install-first from the scan-python-poetry target so the committed
  poetry.lock fast-path runs (stable versions + transitive edges).
- Document the decision in docs/ARCHITECTURE.md.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds two Python detector determinism mechanisms: a project-scoped virtualenv (keyed by working directory hash) for isolated pip install/pip inspect execution, and a fast-path that builds the dependency graph directly from requirements.lock files without installing packages. Both are integrated into PipDetector.ResolveGraph and PipDetector.Install. Comprehensive integration tests validate the fast-path across pip, poetry, uv, and pipenv lockfile formats. Supporting test infrastructure is updated with new example repository pins and documentation describes the strategy.

Changes

Python Detector Determinism: Venv Isolation + Lock Fast-Path

Layer / File(s) Summary
Project-scoped venv provisioning and pip inspect selection
internal/detectors/python/venv.go, internal/detectors/python/venv_test.go
venv.go implements pythonVenvDir (deterministic hash-based path under os.TempDir()), venvPythonPath (platform-specific executable lookup), pipInspectCommandForProject (prefers venv python with fallback), and createPythonVenv (full venv reset and recreation with error handling). Tests verify directory determinism and stability, platform-specific resolution, and command selection fallback/preference behavior.
requirements.lock parsing and structure
internal/detectors/python/piplock.go, internal/detectors/python/piplock_test.go
Defines lockfile filename constant, pinned-requirement and dev-hint regexes, pipLockEntry struct, and pipLockFilePath location helper. parsePipLock scans line-by-line into entries with indented # via annotations; parsePipLockViaLine classifies via tokens as parent packages or input files. Tests validate multiline via handling and file-path detection.
Graph construction and scope propagation from lock
internal/detectors/python/piplock.go, internal/detectors/python/piplock_test.go
depGraphFromRequirementsLock builds an sdk.Graph with synthetic root, wires edges from # via relationships, and attaches orphan nodes. propagatePipScopes BFS-propagates runtime/development scopes with runtime precedence, defaulting unscoped nodes to runtime. Tests assert node IDs, edge structure, root direct dependencies, and scope resolution across multiple contexts.
Lockfile integration tests across pip, poetry, uv, and pipenv
internal/detectors/python/lockfile_integration_test.go
Comprehensive integration tests validating the fast-path graph construction for four Python lockfile formats. Tests assert expected node presence, transitive edge structure, and runtime/development scope propagation. Pip requirements.lock test validates pinned versions and via-based edges; poetry test checks group-based scopes; uv test verifies scope separation; pipenv test confirms lock-only flat-edge behavior.
PipDetector wiring, Python smoke test, and architecture documentation
internal/detectors/python/pip.go, test/smoke/smoke_test.go, docs/ARCHITECTURE.md
ResolveGraph adds requirements.lock fast-path returning immediately on success, falling back to pipInspectCommandForProject otherwise. Install provisions a project venv via createPythonVenv and uses its python binary for pip commands. The scan-python-pip-reachability smoke test is updated to a new pinned repository. Architecture documentation describes both determinism strategies and their role in reproducible scans.

Test Infrastructure and Documentation Updates

Layer / File(s) Summary
Smoke test re-pinning to new example repositories
test/smoke/smoke_test.go
Four smoke test cases (scan-go-reachability, scan-npm-reachability, scan-java-maven-reachability, scan-npm-scope-runtime) are re-pinned to bomly-dev example repositories with updated commit SHAs. Comments are refreshed to describe new repository characteristics while preserving command shapes and tool requirements.
Lite and audit smoke tests, Maven fixture, and prose updates
test/smoke/lite_test.go, test/smoke/audit_test.go, internal/detectors/maven/detector_test.go, internal/support/prose/README.md
Lite smoke tests (TestLiteScan, TestLiteDiff, TestLiteExplain) and audit tests (TestAuditScan, TestAuditDiffAndExplain) are re-pinned to bomly-dev example repositories with updated refs. Maven detector test updates root artifact groupId from com.srcclr to com.bomly and corresponding DirectDependencies lookup. Prose template removes Veracode-specific reference.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~55 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 53.33% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main changes: fixing deterministic behavior in Python scanning (pip/poetry) and moving smoke test fixtures to bomly-dev repositories.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/python-smoke-determinism

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions

github-actions Bot commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

Bomly Diff Summary

Compared 9e7e2fe415f1b3807559b93f235d317166c5d66d to e2b6858ac3904c1afb5bb1e3c25b02bcba54d4d3.

Overview

Status Manifests Dependencies Findings Duration
✅ Pass +0 / ~0 / -0 +0 / ~0 / -0 0 introduced / 0 persisted / 0 resolved 66861ms

Dependency Changes

✅ No dependency changes.

Vulnerabilities

✅ No vulnerability changes.

License Changes

✅ No license changes.

Project Posture

✅ No project posture changes (or --matchers +scorecard was not selected).

Policy Findings

✅ No policy differences were identified.

@coderabbitai

coderabbitai Bot commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

Caution

Failed to replace (edit) comment. This is likely due to insufficient permissions or the comment being deleted.

Error details
{}

bomly-guy and others added 2 commits June 18, 2026 12:02
example-python-pip now ships a fully-pinned requirements.lock, so the pip
detector resolves a deterministic graph via its lock fast-path:

- scan-python-pip: pin to the lock commit and drop --install-first (the
  fast-path needs no install, and installing into the ambient interpreter
  was the source of the drift).
- scan-python-pip-reachability: repoint to bomly-dev/example-python-pip at
  the same commit; its main.py keeps the jwt/django/rsa/requests call paths
  reachability needs, and the committed lock makes the graph stable.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Repoint every smoke/audit target that used an external demo repo to the
equivalent bomly-dev fixture (pinned by commit; all carry committed
lockfiles / go.sum so resolution is deterministic):

- scan-go-reachability        -> example-go-gomod (x/text language.Parse
                                 reached via main -> sub3.Baz)
- scan-java-maven-reachability -> example-java-maven (Main.java imports
                                 commons-fileupload / xmlsec / jbcrypt / spring-web)
- scan-npm-reachability        -> example-javascript-npm (js-yaml.load,
                                 direct + transitive via `to`)
- scan-npm-scope-runtime       -> example-javascript-npm (dev dep mocha
                                 excluded under --scope runtime)
- scan-npm-audit               -> example-javascript-npm (vulnerable deps)

Also scrub residual third-party branding from a maven detector test
fixture (com.srcclr -> com.bomly) and a prose template note.

Goldens for the repointed targets need regeneration.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@bomly-guy bomly-guy changed the title fix(python): make pip/poetry scans deterministic under install-first fix(python): deterministic pip/poetry scans + move smoke fixtures to bomly-dev Jun 18, 2026
bomly-guy and others added 2 commits June 18, 2026 15:20
Repoint the audit and lite go targets (scan-go-enrich/audit/audit-high,
diff-go-audit, explain-go-enrich, lite-scan-go, lite-diff-go,
lite-explain-go) from google/uuid to bomly-dev/example-go-gomod, matching
the convention already used by the basic scan-go/diff-go/explain-go cases
(tags v1.0.0 / v0.9.0..v1.0.0; explain targets golang.org/x/text). Every
smoke/audit scan --url target now resolves to a bomly-dev example repo.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Mirror the node detector's lockfile_integration_test.go: drive each
Python lock fast-path end-to-end through ResolveGraph against real
manifest fixtures under testdata/lockfiles/{pip,poetry,uv,pipenv}, with
node-style helpers (requirePyPackage / requirePyEdge / requirePyScope /
requirePySingleRoot). Covers the binary-free parsers — requirements.lock
(the new pip fast-path), poetry.lock, uv.lock, Pipfile.lock — asserting
package set, transitive edges, single application root, and
runtime/development scope.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (3)
internal/detectors/python/lockfile_integration_test.go (3)

111-115: ⚡ Quick win

Assert the missing fourth transitive edge in the pip fixture test.

Line 111 says requests has four transitives, but Line 112-Line 115 only assert three edges. Add requests -> charset-normalizer so the lock fast-path edge contract is fully covered.

Suggested patch
 	// requests pulls its four transitive deps via "# via requests".
 	requirePyEdge(t, g, "requests", "2.32.3", "certifi", "2024.8.30")
+	requirePyEdge(t, g, "requests", "2.32.3", "charset-normalizer", "3.4.0")
 	requirePyEdge(t, g, "requests", "2.32.3", "idna", "3.10")
 	requirePyEdge(t, g, "requests", "2.32.3", "urllib3", "2.2.3")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/detectors/python/lockfile_integration_test.go` around lines 111 -
115, The test comment indicates that requests has four transitive dependencies,
but the test only verifies three edges using requirePyEdge calls for certifi,
idna, and urllib3. Add a fourth requirePyEdge call to assert the missing edge
from requests version 2.32.3 to charset-normalizer to ensure the lock file
fast-path edge contract is fully tested.

100-176: ⚡ Quick win

Add doc comments for exported test functions to satisfy repository Go standards.

TestPipRequirementsLockFixture, TestPoetryLockFixture, TestUVLockFixture, and TestPipenvLockFixture are exported and should each have a short doc comment.

Suggested patch
+// TestPipRequirementsLockFixture validates requirements.lock fast-path graph construction.
 func TestPipRequirementsLockFixture(t *testing.T) {
@@
+// TestPoetryLockFixture validates poetry.lock + pyproject.toml fast-path graph construction.
 func TestPoetryLockFixture(t *testing.T) {
@@
+// TestUVLockFixture validates uv.lock fast-path graph construction.
 func TestUVLockFixture(t *testing.T) {
@@
+// TestPipenvLockFixture validates Pipfile.lock fast-path graph construction.
 func TestPipenvLockFixture(t *testing.T) {
As per coding guidelines, `**/*.go`: Every exported type and function must have a doc comment.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/detectors/python/lockfile_integration_test.go` around lines 100 -
176, Add documentation comments to the four exported test functions
TestPipRequirementsLockFixture, TestPoetryLockFixture, TestUVLockFixture, and
TestPipenvLockFixture. Each function requires a doc comment line placed directly
above its declaration that begins with the function name followed by a brief
description of what the test validates. Follow Go's standard documentation
format where the comment starts with the function name as it appears in the
code.

Source: Coding guidelines


190-196: ⚡ Quick win

Make the pipenv test enforce the documented flat-graph behavior and pluggy scope.

The comments on Line 190-Line 193 describe flat lock-only behavior and pluggy’s resulting scope, but the assertions do not verify either. Add explicit checks so this known limitation is regression-tested.

Suggested patch
 	// Scope is re-derived from the Pipfile's [packages] / [dev-packages]:
 	// requests is runtime, pytest is development. pluggy is only a transitive
 	// dependency of pytest, but Pipfile.lock is flat (no edge records it), so it
 	// stays runtime — a known limitation of the lock-only fast-path.
 	requirePyScope(t, g, "requests", "2.32.3", sdk.ScopeRuntime)
 	requirePyScope(t, g, "pytest", "8.3.3", sdk.ScopeDevelopment)
+	requirePyScope(t, g, "pluggy", "1.5.0", sdk.ScopeRuntime)
+
+	requestsDeps, err := g.DirectDependencies(pyStableID("requests", "2.32.3"))
+	if err != nil {
+		t.Fatalf("dependencies(requests@2.32.3): %v", err)
+	}
+	if len(requestsDeps) != 0 {
+		t.Errorf("expected flat Pipfile.lock graph; requests should have no direct deps, got %d", len(requestsDeps))
+	}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/detectors/python/lockfile_integration_test.go` around lines 190 -
196, The test comments in this function describe how pluggy is a transitive
dependency of pytest that stays runtime due to the flat lock-only behavior, but
there is no assertion actually verifying pluggy's scope. Add an additional
requirePyScope call (similar to the existing ones for requests and pytest) that
explicitly checks pluggy's scope is sdk.ScopeRuntime to regression-test this
documented limitation and prevent accidental changes to this behavior in the
future.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@internal/detectors/python/lockfile_integration_test.go`:
- Around line 111-115: The test comment indicates that requests has four
transitive dependencies, but the test only verifies three edges using
requirePyEdge calls for certifi, idna, and urllib3. Add a fourth requirePyEdge
call to assert the missing edge from requests version 2.32.3 to
charset-normalizer to ensure the lock file fast-path edge contract is fully
tested.
- Around line 100-176: Add documentation comments to the four exported test
functions TestPipRequirementsLockFixture, TestPoetryLockFixture,
TestUVLockFixture, and TestPipenvLockFixture. Each function requires a doc
comment line placed directly above its declaration that begins with the function
name followed by a brief description of what the test validates. Follow Go's
standard documentation format where the comment starts with the function name as
it appears in the code.
- Around line 190-196: The test comments in this function describe how pluggy is
a transitive dependency of pytest that stays runtime due to the flat lock-only
behavior, but there is no assertion actually verifying pluggy's scope. Add an
additional requirePyScope call (similar to the existing ones for requests and
pytest) that explicitly checks pluggy's scope is sdk.ScopeRuntime to
regression-test this documented limitation and prevent accidental changes to
this behavior in the future.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: ef902472-9262-4b00-b3c8-65bf6f4b5e41

📥 Commits

Reviewing files that changed from the base of the PR and between 229ade8 and e2b6858.

⛔ Files ignored due to path filters (7)
  • internal/detectors/python/testdata/lockfiles/pip/requirements.lock is excluded by !**/*.lock, !**/testdata/**
  • internal/detectors/python/testdata/lockfiles/pipenv/Pipfile is excluded by !**/testdata/**
  • internal/detectors/python/testdata/lockfiles/pipenv/Pipfile.lock is excluded by !**/*.lock, !**/testdata/**
  • internal/detectors/python/testdata/lockfiles/poetry/poetry.lock is excluded by !**/*.lock, !**/testdata/**
  • internal/detectors/python/testdata/lockfiles/poetry/pyproject.toml is excluded by !**/testdata/**
  • internal/detectors/python/testdata/lockfiles/uv/pyproject.toml is excluded by !**/testdata/**
  • internal/detectors/python/testdata/lockfiles/uv/uv.lock is excluded by !**/*.lock, !**/testdata/**
📒 Files selected for processing (1)
  • internal/detectors/python/lockfile_integration_test.go

@bomly-guy bomly-guy merged commit 9cb5b19 into main Jun 18, 2026
13 checks passed
@bomly-guy bomly-guy deleted the fix/python-smoke-determinism branch June 18, 2026 22:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant